Diagnosis code assignment: models and evaluation metrics

Authors

  • Adler J. Perotte
  • Rimma Pivovarov
  • Karthik Natarajan
  • Nicole Weiskopf
  • Frank D. Wood
  • Noémie Elhadad

Abstract

BACKGROUND AND OBJECTIVE: The volume of healthcare data is growing rapidly with the adoption of health information technology. We focus on automated ICD9 code assignment from discharge summary content and on methods for evaluating such assignments.

METHODS: We study ICD9 diagnosis codes and discharge summaries from the publicly available Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC II) repository. We experiment with two coding approaches: one that treats each ICD9 code independently of the others (flat classifier), and one that incorporates the hierarchical structure of ICD9 codes into its modeling (hierarchy-based classifier). We propose novel evaluation metrics that reflect the distances between gold-standard and predicted codes and their locations in the ICD9 tree. The experimental setup, code for modeling, and evaluation scripts are made available to the research community.

RESULTS: The hierarchy-based classifier outperforms the flat classifier, with F-measures of 39.5% and 27.6%, respectively, when trained on 20,533 documents and tested on 2,282 documents. While recall is improved at the expense of precision, our novel evaluation metrics give a more refined assessment: for instance, the hierarchy-based classifier identifies the correct sub-tree of gold-standard codes more often than the flat classifier. Error analysis reveals that the gold-standard codes are not perfect, and as such recall and precision are likely underestimated.

CONCLUSIONS: Hierarchy-based classification yields better ICD9 coding than flat classification for MIMIC patients. Automated ICD9 coding is an example of a task for which data and tools can be shared, and for which the research community can work together to build on shared models and advance the state of the art.
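
The abstract contrasts exact-match F-measure with metrics that account for where gold-standard and predicted codes sit in the ICD9 tree, but it does not define those metrics. The sketch below is therefore only an illustration of the general idea, not the authors' method: the `PARENT` map is a toy hierarchy fragment, and `tree_distance`, `expand_with_ancestors`, and `mean_nearest_distance` are hypothetical helpers introduced here.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy child -> parent map (an assumed, simplified fragment of an ICD9-like tree).
PARENT: Dict[str, Optional[str]] = {
    "ROOT": None,
    "390-459": "ROOT",
    "401-405": "390-459",
    "401": "401-405",
    "401.1": "401",
    "401.9": "401",
    "420-429": "390-459",
    "428": "420-429",
    "428.0": "428",
}


def ancestors(code: str) -> List[str]:
    """Return the path from `code` up to the root, including both ends."""
    path: List[str] = []
    node: Optional[str] = code
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of edges between two codes via their lowest common ancestor."""
    steps_from_a = {node: i for i, node in enumerate(ancestors(a))}
    for steps_from_b, node in enumerate(ancestors(b)):
        if node in steps_from_a:
            return steps_from_a[node] + steps_from_b
    raise ValueError("codes do not share a root")


def precision_recall_f1(gold: Set[str], pred: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F-measure for one document."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def expand_with_ancestors(codes: Set[str]) -> Set[str]:
    """Add every ancestor (except the root) so partial matches earn credit."""
    return {n for c in codes for n in ancestors(c) if n != "ROOT"}


def mean_nearest_distance(gold: Set[str], pred: Set[str]) -> float:
    """Average distance from each gold code to its closest predicted code."""
    return sum(min(tree_distance(g, p) for p in pred) for g in gold) / len(gold)


gold = {"401.9", "428.0"}
pred = {"401.1", "428.0"}

flat = precision_recall_f1(gold, pred)
hier = precision_recall_f1(expand_with_ancestors(gold), expand_with_ancestors(pred))
print("exact match      P/R/F:", flat)
print("ancestor-expanded P/R/F:", hier)
print("mean nearest distance:", mean_nearest_distance(gold, pred))
```

In this toy case the exact-match scores treat the prediction `401.1` as a complete miss for the gold code `401.9`, while the ancestor-expanded scores and the distance measure register that the two codes are siblings under the same parent. That is the kind of distinction the abstract attributes to its tree-aware metrics.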

Similar resources

Solving a multi-objective mixed-model assembly line balancing and sequencing problem

This research addresses the mixed-model assembly line (MMAL) by considering various constraints. In MMALs, several types of highly similar products are made on one assembly line. As a consequence, it is possible to assemble and make several types of products simultaneously without spending any additional time. The proposed multi-objective model considers the balancing and sequ...

Investigating the Role of Code Smells in Preventive Maintenance

The quest to improve software quality has given rise to various studies that focus on enhancing the quality of software through various processes. Code smells, which are indicators of software quality, have not been studied extensively to determine their role in the prediction of defects in software. This study aims to investigate the role of code smells in ...

Election of Diagnosis Codes: Words as Responsible Citizens

Providing computer-aided support for the assignment of diagnosis codes has been approached in numerous ways, often by exploiting free-text fields in patient records. Modeling the 'meaning' of diagnosis codes through statistical data on co-occurrences of words and assigned codes, using a method known as Random Indexing, has only recently been explored as an interesting, alternative solution. It in...

Evaluation of recommender systems: A multi-criteria decision making approach

The evaluation and selection of recommender systems is a difficult decision-making process. This difficulty is partly due to the large diversity of published evaluation criteria and to the lack of standardized methods of evaluation. As such, a systematic methodology is needed that explicitly considers multiple, possibly conflicting metrics and assists decision makers to evaluate and find...

Using Software Metrics to Evaluate Static Single Assignment Form in GCC

Over the past 20 years, static single assignment form (SSA) has risen to become the compiler intermediate representation of choice. Compiler developers cite many qualitative reasons for choosing SSA. In this study, however, we present clear quantitative benefits of SSA by applying several standard software metrics to compiler intermediate code in both SSA and non-SSA forms. The average complexi...

Journal title:

Volume 21, Issue

Pages -

Publication date: 2014